Overview

Dataset statistics

Number of variables: 11
Number of observations: 14776615
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 3.7 GiB
Average record size in memory: 271.0 B
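
The two memory figures above can be cross-checked against each other. The following is a minimal sketch, assuming that "GiB" means 2**30 bytes and that the average record size is simply total memory divided by the number of observations; the function and variable names are illustrative and do not come from the report.

```python
# Values copied from the dataset statistics above.
n_observations = 14_776_615
avg_record_bytes = 271.0

# Assumption: 1 GiB = 2**30 bytes.
GIB = 2 ** 30


def total_size_gib(n_rows: int, bytes_per_row: float) -> float:
    """Estimate the total in-memory size in GiB from the average record size."""
    return n_rows * bytes_per_row / GIB


estimated = total_size_gib(n_observations, avg_record_bytes)
print(f"Estimated total size: {estimated:.1f} GiB")
```

Under these assumptions the estimate rounds to the reported 3.7 GiB.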

Variable types

Text: 2
Categorical: 3
DateTime: 1
Numeric: 5

Alerts

High correlation: fare_amount is highly overall correlated with mta_tax and 1 other field
High correlation: mta_tax is highly overall correlated with fare_amount and 1 other field
High correlation: tolls_amount is highly overall correlated with mta_tax
High correlation: total_amount is highly overall correlated with fare_amount
Imbalance: payment_type is highly imbalanced (55.6%)
Imbalance: mta_tax is highly imbalanced (96.9%)
Zeros: surcharge has 7596039 (51.4%) zeros
Zeros: tip_amount has 7236560 (49.0%) zeros
Zeros: tolls_amount has 14189538 (96.0%) zeros
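
The report does not state how the imbalance percentages are computed. One formula that is consistent with both figures is one minus the Shannon entropy of the class shares, normalised by the maximum entropy for that number of classes. The sketch below applies that assumed formula to the payment_type and mta_tax counts listed in their variable sections; `imbalance_score` is a hypothetical helper name, not a function of the profiling library.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / H_max, where H is the Shannon entropy of the class shares.

    0.0 means perfectly balanced classes; values near 1.0 mean one class dominates.
    """
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log(n_classes)


# Class counts copied from the payment_type and mta_tax sections of this report.
payment_type = [7_743_844, 6_982_383, 32_783, 11_171, 6_434]
mta_tax = [14_729_241, 47_374]

print(f"payment_type: {imbalance_score(payment_type):.1%}")
print(f"mta_tax: {imbalance_score(mta_tax):.1%}")
```

With these counts the assumed formula gives 55.6% and 96.9%, matching the alerts.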

Reproduction

Analysis started: 2025-10-28 01:17:10.760873
Analysis finished: 2025-10-28 01:20:05.551553
Duration: 2 minutes and 54.79 seconds
Software version: ydata-profiling v4.17.0
Download configuration: config.json
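
The reported duration can be recovered from the two timestamps. This is a small sketch using only the standard library; it assumes both timestamps are in the same time zone.

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2025-10-28 01:17:10.760873")
finished = datetime.fromisoformat("2025-10-28 01:20:05.551553")

elapsed = finished - started
minutes, seconds = divmod(elapsed.total_seconds(), 60)
print(f"{int(minutes)} minutes and {seconds:.2f} seconds")
```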

Variables

medallion
Text

Distinct: 13426
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 GiB

Length

Max length: 32
Median length: 32
Mean length: 32
Min length: 32

Characters and Unicode

Total characters: 472851680
Distinct characters: 16
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 33
Unique (%): < 0.1%

Sample

1st row: 89D227B655E5C82AECF13C3F540D4CF4
2nd row: 0BD7C8F5BA12B88E0B67BED28BEA73D8
3rd row: 0BD7C8F5BA12B88E0B67BED28BEA73D8
4th row: DFD2202EE08F7A8DC9A57B02ACB81FE2
5th row: DFD2202EE08F7A8DC9A57B02ACB81FE2

Value                             Count     Frequency (%)
7e1346f23960cc18d7d129fa28b63a75  2137      < 0.1%
6ffcf7a4f34ba44239636028e680e438  2112      < 0.1%
a979cda04cfb8ba3d3acba7e8d7f0661  2039      < 0.1%
d5c7cd37ea4d372d00f0a681cdc93f11  1959      < 0.1%
849e486825860106403fb991a763bcc3  1957      < 0.1%
6fe6dff9a59c0b64be0ca64ee2699f08  1941      < 0.1%
06c961ebe7ef4d13f3ae6c005ee0f501  1893      < 0.1%
22908753e00888cc219c875c8d5bc4f6  1886      < 0.1%
e6101a0f85312c49a5b5950e61d284dc  1882      < 0.1%
6403bf98e4618e21c795c3b45a636d77  1882      < 0.1%
Other values (13416)              14756927  99.9%

Most occurring characters

Value             Count      Frequency (%)
E                 29839347   6.3%
4                 29833374   6.3%
A                 29731461   6.3%
9                 29695897   6.3%
5                 29640335   6.3%
F                 29609660   6.3%
D                 29606522   6.3%
7                 29545983   6.2%
6                 29539893   6.2%
2                 29522375   6.2%
Other values (6)  176286833  37.3%

Most occurring categories

Value      Count      Frequency (%)
(unknown)  472851680  100.0%

Most occurring scripts

Value      Count      Frequency (%)
(unknown)  472851680  100.0%

Most occurring blocks

Value      Count      Frequency (%)
(unknown)  472851680  100.0%

hack_license
Text

Distinct: 32224
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 GiB

Length

Max length: 32
Median length: 32
Mean length: 32
Min length: 32

Characters and Unicode

Total characters: 472851680
Distinct characters: 16
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 182
Unique (%): < 0.1%

Sample

1st row: BA96DE419E711691B9445D6A6307C170
2nd row: 9FD8F69F0804BDB5549F40E9DA1BE472
3rd row: 9FD8F69F0804BDB5549F40E9DA1BE472
4th row: 51EE87E3205C985EF8431D850C786310
5th row: 51EE87E3205C985EF8431D850C786310

Value                             Count     Frequency (%)
00b7691d86d96aebd21dd9e138f90840  1933      < 0.1%
f49fd0d84449ae7f72f3bc492cd6c754  1616      < 0.1%
51c1be97280a80ebfa8dad34e1956cf6  1603      < 0.1%
847349f8845a667d9ac7cdedd1c873cb  1570      < 0.1%
ce625fd96d0fafc812a6957139b354a1  1557      < 0.1%
3d757e111c78f5cac83d44a92885d490  1514      < 0.1%
22ca618759c716436ea3257480199a32  1501      < 0.1%
3aab94ca53fe93a64811f65690654649  1486      < 0.1%
e66e58207128619cff2d2e2c3c7ecc08  1442      < 0.1%
c9674190984ba193ffd8ddcc019804cf  1390      < 0.1%
Other values (32214)              14761003  99.9%

Most occurring characters

Value             Count      Frequency (%)
C                 29776443   6.3%
E                 29713235   6.3%
8                 29655194   6.3%
5                 29608188   6.3%
0                 29603323   6.3%
D                 29589934   6.3%
3                 29585835   6.3%
7                 29576279   6.3%
F                 29553071   6.2%
B                 29539186   6.2%
Other values (6)  176650992  37.4%

Most occurring categories

Value      Count      Frequency (%)
(unknown)  472851680  100.0%

Most occurring scripts

Value      Count      Frequency (%)
(unknown)  472851680  100.0%

Most occurring blocks

Value      Count      Frequency (%)
(unknown)  472851680  100.0%

vendor_id
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 732.8 MiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 44329845
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: CMT
2nd row: CMT
3rd row: CMT
4th row: CMT
5th row: CMT

Common Values

Value  Count    Frequency (%)
CMT    7450899  50.4%
VTS    7325716  49.6%

Length

Histogram of lengths of the category

Value  Count    Frequency (%)
cmt    7450899  50.4%
vts    7325716  49.6%

Most occurring characters

Value  Count     Frequency (%)
T      14776615  33.3%
C      7450899   16.8%
M      7450899   16.8%
V      7325716   16.5%
S      7325716   16.5%

Most occurring categories

Value      Count     Frequency (%)
(unknown)  44329845  100.0%

Most occurring scripts

Value      Count     Frequency (%)
(unknown)  44329845  100.0%

Most occurring blocks

Value      Count     Frequency (%)
(unknown)  44329845  100.0%

pickup_datetime
DateTime

Distinct: 2303465
Distinct (%): 15.6%
Missing: 0
Missing (%): 0.0%
Memory size: 112.7 MiB
Minimum: 2013-01-01 00:00:00
Maximum: 2013-01-31 23:59:59
Invalid dates: 0
Invalid dates (%): 0.0%
Histogram with fixed size bins (bins=50)

payment_type
Categorical

Imbalance 

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 14.1 MiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 44329845
Distinct characters: 10
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: CSH
2nd row: CSH
3rd row: CSH
4th row: CSH
5th row: CSH

Common Values

Value  Count    Frequency (%)
CRD    7743844  52.4%
CSH    6982383  47.3%
NOC    32783    0.2%
DIS    11171    0.1%
UNK    6434     < 0.1%

Length

Histogram of lengths of the category

Value  Count    Frequency (%)
crd    7743844  52.4%
csh    6982383  47.3%
noc    32783    0.2%
dis    11171    0.1%
unk    6434     < 0.1%

Most occurring characters

Value  Count     Frequency (%)
C      14759010  33.3%
D      7755015   17.5%
R      7743844   17.5%
S      6993554   15.8%
H      6982383   15.8%
N      39217     0.1%
O      32783     0.1%
I      11171     < 0.1%
U      6434      < 0.1%
K      6434      < 0.1%

Most occurring categories

Value      Count     Frequency (%)
(unknown)  44329845  100.0%

Most occurring scripts

Value      Count     Frequency (%)
(unknown)  44329845  100.0%

Most occurring blocks

Value      Count     Frequency (%)
(unknown)  44329845  100.0%
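
The character counts reported for payment_type follow directly from its value counts: each character is counted once per occurrence in a value, weighted by how many rows hold that value. A minimal sketch of that tally (the helper name `character_counts` is illustrative):

```python
from collections import Counter

# payment_type value counts, copied from the Common Values table.
value_counts = {
    "CRD": 7_743_844,
    "CSH": 6_982_383,
    "NOC": 32_783,
    "DIS": 11_171,
    "UNK": 6_434,
}


def character_counts(counts):
    """Weight every character of each value by the number of rows holding that value."""
    totals = Counter()
    for value, n_rows in counts.items():
        for ch in value:
            totals[ch] += n_rows
    return totals


chars = character_counts(value_counts)
for ch, n in chars.most_common():
    print(ch, n)
```

For example, "C" appears in CRD, CSH and NOC, so its count is the sum of those three row counts, and the grand total equals three characters per row.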

fare_amount
Real number (ℝ)

High correlation 

Distinct: 1417
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11.664722
Minimum: 2.5
Maximum: 500
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 112.7 MiB

Quantile statistics

Minimum: 2.5
5-th percentile: 4.5
Q1: 6.5
Median: 9
Q3: 13
95-th percentile: 30
Maximum: 500
Range: 497.5
Interquartile range (IQR): 6.5

Descriptive statistics

Standard deviation: 9.6392187
Coefficient of variation (CV): 0.82635649
Kurtosis: 48.307291
Mean: 11.664722
Median Absolute Deviation (MAD): 3
Skewness: 4.0728317
Sum: 1.7236511 × 10^8
Variance: 92.914538
Monotonicity: Not monotonic
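
Several of the fare_amount statistics are derived from others in the same list. The sketch below recomputes them from the reported mean, standard deviation, quartiles and extremes, using the standard definitions (CV = standard deviation / mean, variance = standard deviation squared, range = maximum - minimum, IQR = Q3 - Q1). Whether the profiler uses the sample or population standard deviation is not stated, and it does not matter for these identities.

```python
# Values copied from the fare_amount statistics.
mean = 11.664722
std = 9.6392187
minimum, maximum = 2.5, 500.0
q1, q3 = 6.5, 13.0

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
value_range = maximum - minimum  # range
iqr = q3 - q1                    # interquartile range

print(f"CV={cv:.8f} variance={variance:.6f} range={value_range} IQR={iqr}")
```

The recomputed values agree with the reported CV, variance, range and IQR up to rounding of the inputs.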
Histogram with fixed size bins (bins=50)

Value                Count    Frequency (%)
6                    813197   5.5%
6.5                  806688   5.5%
5.5                  793987   5.4%
7                    782107   5.3%
7.5                  748123   5.1%
5                    725076   4.9%
8                    702986   4.8%
8.5                  655601   4.4%
9                    604185   4.1%
4.5                  592918   4.0%
Other values (1407)  7551747  51.1%

Value  Count  Frequency (%)
2.5    61249  0.4%
2.55   1      < 0.1%
2.6    2      < 0.1%
2.69   1      < 0.1%
2.7    6      < 0.1%
2.75   1      < 0.1%
2.8    4      < 0.1%
2.82   3      < 0.1%
2.83   3      < 0.1%
2.85   1      < 0.1%

Value   Count  Frequency (%)
500     10     < 0.1%
479.4   1      < 0.1%
476.66  1      < 0.1%
475     4      < 0.1%
470     7      < 0.1%
468.5   1      < 0.1%
465     1      < 0.1%
460     1      < 0.1%
450.01  1      < 0.1%
450     10     < 0.1%

surcharge
Real number (ℝ)

Zeros 

Distinct: 23
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.32049042
Minimum: 0
Maximum: 12.5
Zeros: 7596039
Zeros (%): 51.4%
Negative: 0
Negative (%): 0.0%
Memory size: 112.7 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0.5
95-th percentile: 1
Maximum: 12.5
Range: 12.5
Interquartile range (IQR): 0.5

Descriptive statistics

Standard deviation: 0.36757414
Coefficient of variation (CV): 1.1469115
Kurtosis: -0.60733
Mean: 0.32049042
Median Absolute Deviation (MAD): 0
Skewness: 0.68750232
Sum: 4735763.5
Variance: 0.13511075
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=23)

Value              Count    Frequency (%)
0                  7596039  51.4%
0.5                4890084  33.1%
1                  2290231  15.5%
1.5                190      < 0.1%
2                  26       < 0.1%
2.5                13       < 0.1%
3                  7        < 0.1%
0.41               4        < 0.1%
0.82               3        < 0.1%
3.5                3        < 0.1%
Other values (13)  15       < 0.1%

Value  Count    Frequency (%)
0      7596039  51.4%
0.02   1        < 0.1%
0.05   1        < 0.1%
0.08   1        < 0.1%
0.1    1        < 0.1%
0.15   1        < 0.1%
0.41   4        < 0.1%
0.5    4890084  33.1%
0.82   3        < 0.1%
1      2290231  15.5%

Value  Count  Frequency (%)
12.5   2      < 0.1%
10     1      < 0.1%
8.5    1      < 0.1%
8      2      < 0.1%
7.5    1      < 0.1%
7      1      < 0.1%
6      1      < 0.1%
5      1      < 0.1%
3.5    3      < 0.1%
3      7      < 0.1%

mta_tax
Categorical

High correlation  Imbalance 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 732.8 MiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 44329845
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.5
2nd row: 0.5
3rd row: 0.5
4th row: 0.5
5th row: 0.5

Common Values

Value  Count     Frequency (%)
0.5    14729241  99.7%
0.0    47374     0.3%

Length

Histogram of lengths of the category

Value  Count     Frequency (%)
0.5    14729241  99.7%
0.0    47374     0.3%

Most occurring characters

Value  Count     Frequency (%)
0      14823989  33.4%
.      14776615  33.3%
5      14729241  33.2%

Most occurring categories

Value      Count     Frequency (%)
(unknown)  44329845  100.0%

Most occurring scripts

Value      Count     Frequency (%)
(unknown)  44329845  100.0%

Most occurring blocks

Value      Count     Frequency (%)
(unknown)  44329845  100.0%

tip_amount
Real number (ℝ)

Zeros 

Distinct: 2768
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.2675086
Minimum: 0
Maximum: 200
Zeros: 7236560
Zeros (%): 49.0%
Negative: 0
Negative (%): 0.0%
Memory size: 112.7 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0.8
Q3: 2
95-th percentile: 4.75
Maximum: 200
Range: 200
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 2.0460844
Coefficient of variation (CV): 1.6142568
Kurtosis: 177.72682
Mean: 1.2675086
Median Absolute Deviation (MAD): 0.8
Skewness: 6.0812817
Sum: 18729486
Variance: 4.1864613
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value                Count    Frequency (%)
0                    7236560  49.0%
1                    1342116  9.1%
2                    715709   4.8%
1.5                  595019   4.0%
3                    233031   1.6%
2.5                  192926   1.3%
1.8                  183703   1.2%
1.4                  177746   1.2%
1.2                  174359   1.2%
1.6                  174284   1.2%
Other values (2758)  3751162  25.4%

Value  Count    Frequency (%)
0      7236560  49.0%
0.01   3403     < 0.1%
0.02   1079     < 0.1%
0.03   428      < 0.1%
0.04   173      < 0.1%
0.05   890      < 0.1%
0.06   194      < 0.1%
0.07   182      < 0.1%
0.08   776      < 0.1%
0.09   283      < 0.1%

Value   Count  Frequency (%)
200     4      < 0.1%
197     1      < 0.1%
187.75  1      < 0.1%
182.45  1      < 0.1%
180.3   1      < 0.1%
177     1      < 0.1%
166     1      < 0.1%
165     1      < 0.1%
161     1      < 0.1%
160     1      < 0.1%

tolls_amount
Real number (ℝ)

High correlation  Zeros 

Distinct: 502
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.20186698
Minimum: 0
Maximum: 20
Zeros: 14189538
Zeros (%): 96.0%
Negative: 0
Negative (%): 0.0%
Memory size: 112.7 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 20
Range: 20
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.0354807
Coefficient of variation (CV): 5.1295198
Kurtosis: 43.02759
Mean: 0.20186698
Median Absolute Deviation (MAD): 0
Skewness: 5.8594753
Sum: 2982910.6
Variance: 1.0722202
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value               Count     Frequency (%)
0                   14189538  96.0%
4.8                 547869    3.7%
10.25               11614     0.1%
2.2                 5582      < 0.1%
8.25                4832      < 0.1%
9.6                 3296      < 0.1%
6.5                 1111      < 0.1%
14.4                895       < 0.1%
15.05               878       < 0.1%
5                   458       < 0.1%
Other values (492)  10542     0.1%

Value  Count     Frequency (%)
0      14189538  96.0%
0.01   175       < 0.1%
0.02   16        < 0.1%
0.03   8         < 0.1%
0.04   39        < 0.1%
0.05   10        < 0.1%
0.06   11        < 0.1%
0.08   7         < 0.1%
0.09   8         < 0.1%
0.1    9         < 0.1%

Value  Count  Frequency (%)
20     42     < 0.1%
19.85  81     < 0.1%
19.8   4      < 0.1%
19.75  4      < 0.1%
19.65  2      < 0.1%
19.55  1      < 0.1%
19.51  1      < 0.1%
19.45  2      < 0.1%
19.4   1      < 0.1%
19.25  2      < 0.1%

total_amount
Real number (ℝ)

High correlation 

Distinct: 8695
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 13.952985
Minimum: 2.5
Maximum: 650
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 112.7 MiB

Quantile statistics

Minimum: 2.5
5-th percentile: 5.4
Q1: 7.7
Median: 10.5
Q3: 15.5
95-th percentile: 36.8
Maximum: 650
Range: 647.5
Interquartile range (IQR): 7.8

Descriptive statistics

Standard deviation: 11.464686
Coefficient of variation (CV): 0.82166547
Kurtosis: 36.258167
Mean: 13.952985
Median Absolute Deviation (MAD): 3.5
Skewness: 3.8875555
Sum: 2.0617789 × 10^8
Variance: 131.43902
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value                Count    Frequency (%)
6.5                  564079   3.8%
9                    555271   3.8%
8                    546099   3.7%
7                    544153   3.7%
7.5                  531046   3.6%
6                    515046   3.5%
9.5                  497237   3.4%
8.5                  480776   3.3%
10                   432154   2.9%
5.5                  405590   2.7%
Other values (8685)  9705164  65.7%

Value  Count  Frequency (%)
2.5    52     < 0.1%
2.55   1      < 0.1%
2.6    2      < 0.1%
2.69   1      < 0.1%
2.7    1      < 0.1%
2.75   1      < 0.1%
2.8    1      < 0.1%
2.9    5      < 0.1%
3      27854  0.2%
3.01   53     < 0.1%

Value   Count  Frequency (%)
650     2      < 0.1%
508.25  1      < 0.1%
500.5   1      < 0.1%
500     7      < 0.1%
494     1      < 0.1%
479.4   1      < 0.1%
476.66  1      < 0.1%
475     4      < 0.1%
470.5   7      < 0.1%
469.5   1      < 0.1%

Correlations

              fare_amount  mta_tax  payment_type  surcharge  tip_amount  tolls_amount  total_amount  vendor_id
fare_amount   1.000        0.561    0.010         -0.004     0.320       0.319         0.975         0.004
mta_tax       0.561        1.000    0.035         0.008      0.146       0.564         0.371         0.007
payment_type  0.010        0.035    1.000         0.008      0.009       0.029         0.036         0.059
surcharge     -0.004       0.008    0.008         1.000      0.030       -0.076        0.079         0.004
tip_amount    0.320        0.146    0.009         0.030      1.000       0.153         0.472         0.003
tolls_amount  0.319        0.564    0.029         -0.076     0.153       1.000         0.327         0.022
total_amount  0.975        0.371    0.036         0.079      0.472       0.327         1.000         0.007
vendor_id     0.004        0.007    0.059         0.004      0.003       0.022         0.007         1.000
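
The "High correlation" alerts in the overview can be related back to this matrix by listing the off-diagonal pairs whose absolute coefficient exceeds a cut-off. The sketch below uses a threshold of 0.5, which is an assumption (the report does not state its threshold) that happens to reproduce the alerted pairs; `high_correlation_pairs` is a hypothetical helper.

```python
names = [
    "fare_amount", "mta_tax", "payment_type", "surcharge",
    "tip_amount", "tolls_amount", "total_amount", "vendor_id",
]

# Correlation matrix copied from the table above (row and column order as in `names`).
matrix = [
    [1.000, 0.561, 0.010, -0.004, 0.320, 0.319, 0.975, 0.004],
    [0.561, 1.000, 0.035, 0.008, 0.146, 0.564, 0.371, 0.007],
    [0.010, 0.035, 1.000, 0.008, 0.009, 0.029, 0.036, 0.059],
    [-0.004, 0.008, 0.008, 1.000, 0.030, -0.076, 0.079, 0.004],
    [0.320, 0.146, 0.009, 0.030, 1.000, 0.153, 0.472, 0.003],
    [0.319, 0.564, 0.029, -0.076, 0.153, 1.000, 0.327, 0.022],
    [0.975, 0.371, 0.036, 0.079, 0.472, 0.327, 1.000, 0.007],
    [0.004, 0.007, 0.059, 0.004, 0.003, 0.022, 0.007, 1.000],
]


def high_correlation_pairs(names, matrix, threshold=0.5):
    """Return (a, b, coefficient) for each pair above the diagonal with |coefficient| >= threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(matrix[i][j]) >= threshold:
                pairs.append((names[i], names[j], matrix[i][j]))
    return pairs


for a, b, r in high_correlation_pairs(names, matrix):
    print(f"{a} ~ {b}: {r:.3f}")
```

With this cut-off the pairs are fare_amount with mta_tax, fare_amount with total_amount, and mta_tax with tolls_amount, which is the same set of relationships named in the alerts.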

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

          medallion                         hack_license                      vendor_id  pickup_datetime      payment_type  fare_amount  surcharge  mta_tax  tip_amount  tolls_amount  total_amount
0         89D227B655E5C82AECF13C3F540D4CF4  BA96DE419E711691B9445D6A6307C170  CMT        2013-01-01 15:11:48  CSH           6.5          0.0        0.5      0.0         0.0           7.0
1         0BD7C8F5BA12B88E0B67BED28BEA73D8  9FD8F69F0804BDB5549F40E9DA1BE472  CMT        2013-01-06 00:18:35  CSH           6.0          0.5        0.5      0.0         0.0           7.0
2         0BD7C8F5BA12B88E0B67BED28BEA73D8  9FD8F69F0804BDB5549F40E9DA1BE472  CMT        2013-01-05 18:49:41  CSH           5.5          1.0        0.5      0.0         0.0           7.0
3         DFD2202EE08F7A8DC9A57B02ACB81FE2  51EE87E3205C985EF8431D850C786310  CMT        2013-01-07 23:54:15  CSH           5.0          0.5        0.5      0.0         0.0           6.0
4         DFD2202EE08F7A8DC9A57B02ACB81FE2  51EE87E3205C985EF8431D850C786310  CMT        2013-01-07 23:25:03  CSH           9.5          0.5        0.5      0.0         0.0           10.5
5         20D9ECB2CA0767CF7A01564DF2844A3E  598CCE5B9C1918568DEE71F43CF26CD2  CMT        2013-01-07 15:27:48  CSH           9.5          0.0        0.5      0.0         0.0           10.0
6         496644932DF3932605C22C7926FF0FE0  513189AD756FF14FE670D10B92FAF04C  CMT        2013-01-08 11:01:15  CSH           6.0          0.0        0.5      0.0         0.0           6.5
7         0B57B9633A2FECD3D3B1944AFC7471CF  CCD4367B417ED6634D986F573A552A62  CMT        2013-01-07 12:39:18  CSH           34.0         0.0        0.5      0.0         4.8           39.3
8         2C0E91FF20A856C891483ED63589F982  1DA2F6543A62B8ED934771661A9D2FA0  CMT        2013-01-07 18:15:47  CSH           5.5          1.0        0.5      0.0         0.0           7.0
9         2D4B95E2FA7B2E85118EC5CA4570FA58  CD2F522EEE1FF5F5A8D8B679E23576B3  CMT        2013-01-07 15:33:28  CSH           13.0         0.0        0.5      0.0         0.0           13.5

          medallion                         hack_license                      vendor_id  pickup_datetime      payment_type  fare_amount  surcharge  mta_tax  tip_amount  tolls_amount  total_amount
14776605  A8262FA0AFCB6C7229F6888EAFBDE076  1BDF89260FEF1AE6FDDE839A0278D31D  CMT        2013-01-07 07:29:06  CSH           52.0         0.0        0.5      0.0         4.8           57.3
14776606  A8262FA0AFCB6C7229F6888EAFBDE076  1BDF89260FEF1AE6FDDE839A0278D31D  CMT        2013-01-07 14:30:23  CSH           9.5          0.0        0.5      0.0         0.0           10.0
14776607  F33EF464441839C6F0DABAABBC93B45D  313F66DD09C308EADA3B307F6B8CF7A9  CMT        2013-01-10 10:56:47  CSH           7.5          0.0        0.5      0.0         0.0           8.0
14776608  56CE01E7DBE0E6449FA1758F082D8884  4C6FE2FCFED26629D515D291EC1516A0  CMT        2013-01-10 14:50:01  CSH           20.0         0.0        0.5      0.0         0.0           20.5
14776609  32201027CDC62D654DC3AD9747A07C96  B8DDB9F8143017E22104050B26C2A65D  CMT        2013-01-05 08:58:18  CSH           10.5         0.0        0.5      0.0         0.0           11.0
14776610  B33E71CD9E8FE1BE3B70FEB6E807DD15  BAF57796E45D921BB23217E17A372FF6  CMT        2013-01-06 04:58:23  CSH           13.0         0.5        0.5      0.0         0.0           14.0
14776611  ED160B76D5349C8AC1ECF22CD4B8D538  3B93F6DA5DEBDE9560993FA624C4FF76  CMT        2013-01-08 14:42:04  CSH           7.5          0.0        0.5      0.0         0.0           8.0
14776612  D83F9AC0E33F6F19869C243BE6AB6FE5  85A55B6772275374EF90AC9457DC1F83  CMT        2013-01-10 13:29:23  CSH           6.0          0.0        0.5      0.0         0.0           6.5
14776613  04E59442A7DDBCE515E33CD355D866E7  7913172189931A1A1632562B10AB53C4  CMT        2013-01-06 16:30:15  CSH           9.5          0.0        0.5      0.0         0.0           10.0
14776614  D30BED60331C79E3F7ACD05B325ED42F  B5E1D2461A5BCC8819188DACEC17CD69  CMT        2013-01-05 20:38:46  CSH           5.0          0.5        0.5      0.0         0.0           6.0
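
In the sampled rows, total_amount equals the sum of fare_amount, surcharge, mta_tax, tip_amount and tolls_amount. The sketch below checks that relationship on a few rows copied from the sample; whether it holds for every row of the full dataset is not stated in the report, and the helper name `components_match_total` is illustrative.

```python
import math

# (fare_amount, surcharge, mta_tax, tip_amount, tolls_amount, total_amount)
# copied from sample rows 0, 1, 7 and 14776605.
sample_rows = [
    (6.5, 0.0, 0.5, 0.0, 0.0, 7.0),
    (6.0, 0.5, 0.5, 0.0, 0.0, 7.0),
    (34.0, 0.0, 0.5, 0.0, 4.8, 39.3),
    (52.0, 0.0, 0.5, 0.0, 4.8, 57.3),
]


def components_match_total(row, tolerance=0.005):
    """Check that the five charge components add up to total_amount within half a cent."""
    *components, total = row
    return math.isclose(sum(components), total, abs_tol=tolerance)


for row in sample_rows:
    print(row, components_match_total(row))
```

The tolerance guards against floating-point rounding when adding values such as 4.8 that have no exact binary representation.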